Comparative transcriptome analysis of Veratrum maackii and Veratrum nigrum reveals multiple candidate genes involved in steroidal alkaloid biosynthesis

Veratrum (Melanthiaceae; Liliales) is a genus of perennial herbs known for the production of unique bioactive steroidal alkaloids. However, the biosynthesis of these compounds is incompletely understood because many of the downstream enzymatic steps have yet to be resolved. RNA-Seq is a powerful method that can be used to identify candidate genes involved in metabolic pathways by comparing the transcriptomes of metabolically active tissues to controls lacking the pathway of interest. The root and leaf transcriptomes of wild Veratrum maackii and Veratrum nigrum plants were sequenced and 437,820 clean reads were assembled into 203,912 unigenes, 47.67% of which were annotated. We identified 235 differentially expressed unigenes potentially involved in the synthesis of steroidal alkaloids. Twenty unigenes, including new candidate cytochrome P450 monooxygenases and transcription factors, were selected for validation by quantitative real-time PCR. Most candidate genes were expressed at higher levels in roots than leaves but showed a consistent profile across both species. Among the 20 unigenes putatively involved in the synthesis of steroidal alkaloids, 14 were already known. We identified three new CYP450 candidates (CYP76A2, CYP76B6 and CYP76AH1) and three new transcription factor candidates (ERF1A, bHLH13 and bHLH66). We propose that ERF1A, CYP90G1-1 and CYP76AH1 are specifically involved in the key steps of steroidal alkaloid biosynthesis in V. maackii roots. Our data represent the first cross-species analysis of steroidal alkaloid biosynthesis in the genus Veratrum and indicate that the metabolic properties of V. maackii and V. nigrum are broadly conserved despite their distinct alkaloid profiles.

Identification and analysis of differentially expressed genes. The transcriptomic comparison of two tissues in two species provides an opportunity to define the components of a tissue-specific metabolic pathway but also to evaluate differences in gene expression between species that lead to different metabolic profiles, which was not possible in earlier studies involving a single Veratrum species 10,19 . We found 3534 differentially expressed genes (DEGs) when comparing V. maackii roots vs V. nigrum roots (MR vs NR), 2004 upregulated and 1530 downregulated in V. maackii. Importantly 1019 transcripts were only present in MR and another 814 only in NR, indicating qualitative as well as quantitative differences in the cross-species transcriptomic comparison. We also found 5734 DEGs when comparing V. maackii leaves vs V. nigrum leaves (ML vs NL), 2919 upregulated and 2815 downregulated in V. maackii. As above, 1090 transcripts were only present in ML and another 902 only in NL.
The within-species comparison of tissues revealed 3269 DEGs when comparing V. maackii roots and leaves (MR vs ML), 1593 upregulated and 1676 downregulated in the roots, including 862 transcripts that were only present in MR and another 339 only present in ML. Similarly, the NR vs NL comparison in V. nigrum revealed 1122 DEGs, 263 upregulated and 859 downregulated in the roots, including 75 transcripts that were only present in NR and 124 only present in ML (Fig. 4). Interestingly, zero transcripts were shared between all four sample types. Functional enrichment of differentially expressed genes. We examined the GO and KEGG annotations of the DEGs to determine any species-dependent or tissue-specific differences in gene functions. The GO biological process category "metabolic process" (GO:0008152) was more enriched in V. maackii than in V. nigrum. The GO biological process categories "oxidation-reduction process" (GO:0055114) and "rRNA processing" (GO:0006364), the GO cellular components "integral component of membrane" (GO:0016021), "membrane" (GO:0016020) and "chloroplast" (GO:0009507), and the molecular functions "ATP binding" (GO:0005524), "metal ion binding" (GO:0046872) and "oxidoreductase activity" (GO:0016491) were enriched in the leaves of both species but not in the roots. We observed major differences between leaves and roots in multiple GO categories (Supplementary Table S2).   Table S3). Approximately 100 pathways were enriched in the MR vs NR, ML vs NL and MR vs ML comparisons, but only 50 in NR vs NL. These involved 263, 329, 499 and 185 unigenes, respectively. In MR vs NR, "glycolysis/ gluconeogenesis", "purine metabolism" and "oxidative phosphorylation" were the most enriched pathways, each with 18 unigenes. In ML vs NL, "ribosome", "purine metabolism" and "glycolysis/gluconeogenesis", were the most enriched pathways, with 22, 20 and 19 unigenes, respectively. In MR vs ML, the most enriched categories were "photosynthesis", "carbon fixation in photosynthetic organisms" and "phenylpropanoid biosynthesis", with 58, 38 and 32 unigenes, respectively. Similarly, in NR vs NL, the most enriched categories were "photosynthesis", "photosynthesis-antenna proteins" and "carbon fixation in photosynthetic organisms", with 46, 24 and 22 unigenes, respectively. In the highly relevant category of "steroid biosynthesis", the largest number of unigenes (five) was found in the MR vs ML comparison, whereas the differences in the MR vs NR and ML vs NL comparisons were insignificant (two unigenes). These data suggest there is only a small difference between species in terms of steroidal alkaloid metabolism but a large difference between leaf and root tissues.
Identification of regulatory genes. One of the benefits of RNA-Seq analysis in the context of metabolic pathways is that differences in gene expression can identify not only enzymes involved directly in the metabolic pathway but also the regulatory proteins-especially transcription factors (TFs)-that control it. We were able to assign 867 unigenes to 16 known plant TF gene families, including 168 (19.36%) belonging to the MYB family, 152 (17.55%) to the bZIP family, 126 (14.56%) to the bHLH family, 94 (10.85%) to the ERF family and 92 (10.62%) to the C2H2 family (Fig. 5). Importantly, 234 of the TF genes were differentially expressed, including 131 when comparing MR vs ML and 37 when comparing NR vs NL, with the majority in both cases representing the MYB family ( Supplementary Fig. S3). Another 42 TF genes differed when comparing MR vs NR, with the majority representing the C2H2 family, followed by MYB, AP2/ERF and bHLH. Furthermore, 77 TF genes differed when comparing ML vs NL, with the majority representing the zinc finger family, followed by bHLH, WRKY, NAC, MYB, ARF, GATA, bZIP and C2H2 ( Supplementary Fig. S4). All differentially expressed bZIP, C2H2 and bHLH TF genes were upregulated in roots compared to the leaves in both species ( Supplementary  Fig. S4).
Identification of CYP450 genes. We identified 444 unigenes annotated as CYP450s, 139 of which could not be assigned to a family. The rest were assigned to nine clans and 41 families (Supplementary Table S5; Supplementary Table S6). The most abundant families were CYP71 and CYP76, together accounting for more than half of the total, which is consistent with the distribution of CYPs found in other plant species 20 . We found that 153 of the CYP450 unigenes were differentially expressed, including the previously reported CYP90B27, CYP90G1 (two unigenes) and CYP94N that may be involved in cyclopamine synthesis 19 . The CYP450 unigenes were expressed at higher levels in the roots than the leaves in both species.
Validation of candidate genes by RT-qPCR. We selected 20 candidate genes from among the DEGs encoding enzymes and TFs associated with the steroidal alkaloid pathway and validated their expression profiles by RT-qPCR. All 10 candidate genes encoding enzymes in the upstream part of the steroidal alkaloid pathway were expressed at higher levels in the roots than the leaves of both species (  Similarly, all seven CYP450 candidate genes were expressed more strongly in roots than leaves: CYP90G1 (two unigenes), CYP90B27, CYP94N, CYP76A2, CYP76B6 and CYP76AH1 (Fig. 7). Finally, among the three TF genes we tested, ERF1A and bHLH66 were expressed more strongly in the roots whereas bHLH13 was expressed more strongly in the leaves, suggesting the latter may be a negative regulator (Fig. 7). The RT-qPCR results were consistent with the RNA-Seq data for most candidate genes, except that MVD was expressed at a higher level in NL than NR and CAS2 was expressed at a higher level in ML than MR.

Discussion
Differential accumulation of steroidal alkaloids. Veratrum plants produce a rich variety of steroidal alkaloids many of which are of pharmaceutical interest, and the alkaloid profiles of different species are unique. For example, the steroidal alkaloid profile of V. californicum includes the significant accumulation of cyclopamine in rhizomes, followed by roots and bulbs, resulting in a 20-fold difference between underground and aerial organs 19 . This suggests that steroidal alkaloids in V. californicum are synthesized in underground organs and transported to other parts of the plant. The comparison of V. californicum roots and leaves indicated a 20-fold to 100-fold higher content of cyclopamine in the roots 21,22 , but this varies considerably with harvest location and growth stage 22 . Similarly, V. nigrum roots were found to accumulate twice as much jervine as the leaves 23 .
As the basis for our RNA-Seq experiments, we extended this analysis to include multiple alkaloids and multiple species. We found that V. nigrum and V. maackii showed opposing profiles for the accumulation of jervine and cyclopamine, with jervine accumulating to much higher levels in V. maackii roots compared to leaves and to the roots and leaves of V. nigrum, whereas cyclopamine was more abundant in the roots of V. nigrum compared to V. maackii and was not detected in the leaves of either species. These results show that V. nigrum and V. maackii are good models for the analysis of interspecific differences in alkaloid accumulation, which could be due to various factors including differential gene expression, protein synthesis/stability, enzyme activity and or product turnover. As mentioned above, the accumulation of alkaloids is influenced by environmental factors and varies with growth stage and season. In most studies the harvesting season of the studied material is not mentioned but our study confirms the pattern observed for cyclopamine in V. californicum where the underground organs generally accumulate higher alkaloid concentrations than aboveground organs 19,22 . For jervine this is true for V. maackii but for V. nigrum the ratio is more balanced. There seems to be a tendency to accumulate more alkaloids towards fall in temperate Veratrum species, but our data are based on a single collection in early spring representing a snapshot only and information on the dynamics of alkaloid accumulation in V. maackii and V. nigrum are missing. Specific environmental cues leading to stimulation of alkaloid biosynthesis are unknown but master regulators such as the identified transcription factors likely play a central role in the signalling cascade.
De novo assembly and gene functional classification. We assembled separate transcriptome libraries for root and leaf tissue in V. nigrum and V. maackii to facilitate comparative transcriptomic analysis of tissuespecific and species-dependent gene expression in the context of steroidal alkaloid biosynthesis. This extends the analysis of V. nigrum by Szeliga et al. 10 although they considered three separate tissues (root, leaf and stalk). The inclusion of two aerial tissues refines the analysis but the main relevant distinctions resulting from their work was between the roots and other tissues, so we restricted our analysis to roots vs leaves in order to reduce the number of transcriptome datasets. This simplification allowed us to construct and analyze larger transcriptome libraries compared to the Szeliga study, leading to the isolation of 132,155 and 109,287 unigenes in the V. nigrum root and leaf, respectively ( Supplementary Fig. S5A), as well as 124,802 and 101,035 in the V. maackii root and   Fig. S5A,B). Szeliga and colleagues annotated their transcriptome datasets and found more annotated unigenes in the leaves than the roots 10 . In contrast, we found more annotated unigenes in the root transcriptomes of both species) (Supplementary Table S7). Both studies found the largest numbers of annotations in the Nr database. We were able to annotate 64,329 (66%) unigenes in MR, 43,151 (44%) in ML, 61,428 (63%) in NR and 46,139 (47%) in NL. We also identified more unigenes that could be annotated in KEGG (21,581). The differences between our results and those in the earlier study of V. nigrum may reflect the different sources of plants (wild plants in our study, but laboratory-grown plants in the earlier study 10 ) and/or differences in the timing of plant collection for sampling.
Maximally 12% of the unigenes had homologous matches with individual other species. This value is too low to substantiate close phylogenetic relationship. Relationship within monocotyledons is however clearly indicated. Crop plants are generally well represented in databases and therefore the best hits, Elaeis guineensis and Phoenix dactylifera, should not be overrated. A previous study on V. nigrum 10 notes alignments with Oryza sativa, a monocotyledon crop, which is lacking steroidal alkaloids, too. www.nature.com/scientificreports/ Genes encoding enzymes involved in steroidal alkaloid biosynthesis. The biosynthesis of steroidal alkaloids can be divided into three stages: the upstream MVA pathway that produces 2,3-oxidosqualene, the conversion of cycloartenol to cholesterol, and the conversion of cholesterol to steroidal alkaloids (Fig. 8) 24,25 . The first stage involves six key enzymes (HMGS, HMGR, MK, PMK, MVD and SQE) and we screened the corresponding genes, which were also identified by Szeliga et al. 10 . We found that most of these genes were strongly expressed in roots, with MVD as the only exception (Fig. 6). HMGR is a rate-limiting step in the MVA pathway, catalyzing the transformation of 3-hydroxy-3-methylglutaryl-coenzyme A (HMGR-CoA) into mevalonate (MVA) 26 . HMGR was expressed more strongly in roots than leaves in both species, as confirmed herein and previously 10 by qRT-PCR. The second stage of steroidal alkaloid synthesis involves three genes encoding the enzymes CAS, CPI and CYP51. We found that CPI and CYP51 were expressed at higher levels in the roots of both species than the leaves (Fig. 6). CPI catalyzes the conversion of 31-norcycloartanol to 31-nor-24(25) dihydrolanosterol and CYP51 catalyzes 14α-demethylation during the initial reaction of plant sterol biosynthesis 27 . Both CPI and CYP51 are involved in cholesterol biosynthesis in solanaceous plants 25 . Importantly, cycloartenol is considered the first cyclic intermediate in the biosynthesis of steroidal alkaloids but an alternative pathway has been proposed in Arabidopsis thaliana involving the conversion of 2,3-oxidosqualene into lanosterol. The absence of annotated transcripts encoding lanosterol synthase in our dataset suggests that this alternative path- www.nature.com/scientificreports/ way is not present or is present but silenced in V. nigrum and V. maackii, confirming that steroidal alkaloids are synthesized from cycloartenol in these species 25,28 .

Genes encoding transcription factors involved in steroidal alkaloid biosynthesis
To identify TFs that regulate metabolism in V. maackii and V. nigrum 29 , we screened annotated unigenes representing different TF families and found that the most abundant were MYB, bZIP, bHLH, AP2/ERF and C2H2. The ranking differed slightly in the two species (V. maackii-MYB, C2H2, bZIP, bHLH, WRKY, NAC and AP2/ ERF; V. nigrum-MYB, bZIP, WRKY, C2H2, AP2/ERF, GATA and bHLH) and also differed from the results provided by Szeliga and colleagues (bHLH, C2H2, HD-ZIP, MYB and C3H), where the bHLH family was found only in stalk and root tissues while the C2H2 family was only found in the leaves 10 . In contrast, we found that the bHLH and C2H2 families were expressed in roots and leaves and were upregulated in roots compared to leaves. This discrepancy may reflect the differences in plant sources discussed above and/or the larger number of unigenes discovered in our study. We selected three TF genes for validation by qRT-PCR. The first was AP2/ERF, which was expressed at significantly higher levels in the roots than the leaves of both species and was also expressed at significantly higher levels in V. maackii compared to V. nigrum roots (MR vs NR) whereas there was a nonsignificant interspecific difference in leaves (ML vs NL). The AP2/ERF protein GAME9 is a TF in solanaceous plants that can interact with SIMYC2 to regulate the expression of upstream genes such as C5-SD (encoding D(7)-sterol-C5(6)-destaurase) to regulate the synthesis of glycoside alkaloids 30 . It is possible that AP2/ERF plays a similar role in Veratrum plants. The strong correlation between ERF1A expression and the cyclopamine content of roots and leaves in both species suggests that ERF1A specifically regulates the synthesis of cyclopamine in V. maackii roots. The other two candidates are members of the bHLH family, which is known to regulate the synthesis of flavonoids and alkaloids 31 . We found that bHLH13 and bHLH66 were most strongly upregulated in V. maackii roots, and that bHLH66 was significantly (p < 0.05) more abundant in the roots of both species than the leaves, but especially in V. maackii (Fig. 7). We speculate that both TFs may regulate steroidal alkaloid biosynthesis. The strong correlation between expression of these two TFs and the steroidal alkaloid content indicated that bHLH13 might regulate cyclopamine synthesis whereas bHLH66 might regulate the conversion of cyclopamine to jervine. Few previous studies have addressed the transcriptional regulation of steroidal alkaloid biosynthesis in V. nigrum 10,19 and none (to our knowledge) in V. maackii, so these three candidates should be examined in more detail in future studies to determine their precise roles.
Candidate CYP450s involved in steroidal alkaloid biosynthesis. The CYP450 family is the largest enzyme family in plant metabolism and it plays a key role in the diversification and functional modification of triterpenoid and sterol skeletons 32,33 . We examined seven differentially expressed CYP450 genes that may be involved in steroid alkaloid biosynthesis. Three of them (CYP90B27, CYP90G1 and CYP94N1) catalyze the conversion of cholesterol into downstream intermediates. CYP90B27 converts cholesterol to 22(R)-hydroxylcholesterol, which is oxidized in two steps to form 22-hydroxylcholesterol-26-al by CYP94N1. Another enzyme (GABAT1) converts 22-hydroxylcholesterol-26-al into 22-hydroxyl-26-aminolcholesterol, which is in turn converted into 22-keto-26-aminolcholesterol by CYP90G1 (Fig. 8). We found that the transcripts for CYP90G1, CYP94N and CYP90B27 were significantly (p < 0.05) more abundant in V. maackii and V. nigrum roots than leaves (Fig. 7), which is consistent with earlier reports 19 . The strong correlation between CYP90G1-1 expression and the steroidal alkaloid content of the tissues in both species indicated that CYP90G1-1 might be the key enzyme for steroidal alkaloid biosynthesis in V. maackii roots (Figs. 1 and 7). The root is the main medicinal part of the Veratrum plant due to the accumulation of steroidal alkaloids 34 . It is therefore anticipated that enzymes responsible for the decoration of the alkaloid skeleton would be predominantly expressed in the roots. Three further CYP450 unigenes (CYP76A2, CYP76B6 and CYP76AH1), with strong expression in the roots according to RNA-Seq data, were validated by qRT-PCR (Fig. 7).
In the third stage of steroidal alkaloid synthesis, the enzymes that convert 22-keto-26-aminocholesterol to cyclopamine are generally unknown (Fig. 9). Verazine, the predicted precursor of cyclopamine, is most likely formed by the spontaneous cyclization of 22-keto-26-aminocholesterol, a reactive intermediate. Verazine possibly undergoes 16a-hydroxylation to form ethioline catalyzed by 16DOX and is then reduced to form teinemine and is converted to solanidine in the presence of light. Solanidine is oxidized to form epirubijervine and is reduced in the presence of light to form husukinidine. The latter is then transformed into cyclopamine and further into jervine by additional oxidation reactions, or is reduced to form veratramine 11,[35][36][37][38][39][40][41] . Given the correlation between CYP450 expression and the distribution of steroidal alkaloids, we speculate that these enzymes are also involved in the final part of steroidal alkaloid biosynthesis pathway. The strong correlation between CYP76AH1 expression and the steroidal alkaloid content of the tissues in both species suggests that CYP76AH1 is specifically involved in the final segment of steroidal alkaloid biosynthesis in V. maackii roots (Figs. 1 and 7). The precise functions of the three new candidate genes CYP76A2, CYP76B6 and CYP76AH1 in the final part of the steroidal alkaloid biosynthesis pathway should be evaluated in more detail.
In conclusion, we combined RNA-Seq analysis and the quantification of specific steroidal alkaloids in two tissues from two Veratrum species to facilitate the rapid identification of genes encoding enzymes and transcription factors related to secondary metabolism in a genus of non-model medicinal plants. We identified 97,207 V. nigrum and V. maackii unigenes which were annotated using the Nr, GO, KOG and KECG databases, including 3534 genes that were differentially expressed between tissues or between species. We found that 235 differentially expressed unigenes encoded enzymes with a potential role in steroidal alkaloid biosynthesis. Twenty unigenes putatively involved in steroidal alkaloid synthesis, including 14 known genes, three new candidate genes encoding CYP450s (CYP76A2, CYP76B6 and CYP76AH1), and three new candidate genes encoding TFs (ERF1A, bHLH13 and bHLH66), were rapidly identified by comparative transcriptome analysis. We propose that ERF1A, CYP90G1-1 and CYP76AH1 are specifically involved in the key steps of steroidal alkaloid biosynthesis in V. maackii roots. Our data provide a valuable reference for the study of steroidal alkaloid biosynthetic pathways www.nature.com/scientificreports/ across Veratrum species, helping to identify the genes involved in the main biosynthetic pathway and also those responsible for the species-dependent steroidal alkaloid profiles. Determination of steroidal alkaloid content. We soaked 10 g of lyophilized and powdered sample in 40 ml of 80% ethanol at room temperature for 1 h, and then in 95% ethanol (1:4 w/v) at 80 °C for 2 h with continuous stirring. The solvent was collected by filtration. The sample residue was extracted again in 95% ethanol (1:10 w/v) under the same conditions. This procedure was repeated a third time. The three filtrates were combined, the ethanol was removed by evaporation, and then the extract was dissolved in methanol for subsequent analysis. The concentrations of jervine and cyclopamine in the extract were determined by HPLC-UV using commercial cyclopamine and jervine (LC Laboratories, Woburn, MA, USA) as reference compounds 42 . All samples were analysed in triplicates. SigmaPlot 10.0 (Systat Software Inc., San Jose, USA) was used for statistical analysis and plotting. Significant differences in alkaloid contents within species' root and leaf tissues  RNA extraction, library construction and transcriptome sequencing. Total RNA was extracted from the 12 samples using Trizol Reagent. RNA quality and quantity were assessed using a NanoPhotometer spectrophotometer, and RNA integrity was examined by 1% agarose gel electrophoresis. Library construction and RNA-Seq analysis were carried out by Shanxi Boride Biotechnology using the Illumina HiSeq 2500 platform. Qubit2.0 and Agilent 2100 were used to determine the library concentration and insert size, and qPCR was used for library quantification and quality control. Trinity was used to assemble the transcript sequences.
DEG analysis. DEGs were identified using DESeq 52 . The false discovery rate (FDR) was used to determine the p value threshold. A FDR < 0.01 and a fold change (FC) ≥ 2 were applied as screening criteria to identify DEGs between pairs of samples.

Validation of gene expression by RT-qsPCR analysis.
Total RNA was reverse transcribed into cDNA using the Integrated First Strand cDNA Synthesis Kit (DiNing, Beijing, China). The expression levels of unigenes putatively involved in sterol synthesis were monitored by RT-qPCR using the primer sequences listed in Table S1 (Supplementary Table S8). RT-qPCR analysis was performed with a Roche LightCycler480 Real-time PCR System using SYBR Premix Ex Taq (TaKaRa, Tokyo, Japan). The reaction conditions were as follows: 95 °C for 10 min, and 40 cycles of 95 °C for 15 s, 58 °C for 1 min. The actin (ACT) gene was used as an internal reference, and the relative expression level of each unigene was determined using the 2−ΔΔCt method 53 .

Data availability
All the essential data associated with this manuscript are made available as supplementary data and in online repositories at https:// www. ncbi. nlm. nih. gov/, PRJNA912756.